
    Compositionality as an Analogical Process: Introducing ANNE

    Usage-based constructionist approaches consider language a structured inventory of constructions, i.e., form-meaning pairings of varying schematicity and complexity, and claim that the more a linguistic pattern is encountered, the more accessible it becomes to speakers. However, when an expression is unavailable, what processes underlie its interpretation? While traditional answers rely on the principle of compositionality, according to which meaning is built word by word and incrementally, usage-based theories argue that novel utterances are created on the basis of previously experienced ones through analogy, mapping an existing structural pattern onto a novel instance. Starting from this theoretical perspective, we propose a computational implementation of these assumptions. Just as the principle of compositionality has been used to generate distributional representations of phrases, we propose a neural network that simulates the construction of phrasal embeddings as an analogical process. Our framework, inspired by word2vec and computer vision techniques, was evaluated on tasks of generalization from existing vectors.
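    The abstract leaves ANNE's architecture unspecified, so the following is only an illustrative sketch of the contrast it draws: a word-by-word compositional baseline versus a mapping learned from previously observed phrase vectors and reused, by analogy, for an unattested combination. The toy vocabulary, the dimensionality, and the least-squares mapping are assumptions made for the example, not the actual model.

```python
# Illustrative sketch (not the actual ANNE architecture): contrast a
# compositional baseline (vector addition) with a mapping learned from
# previously observed phrases and applied, by analogy, to a novel phrase.
import numpy as np

rng = np.random.default_rng(0)
dim = 50                                             # toy embedding dimensionality
vocab = ["buy", "sell", "book", "car", "house"]
emb = {w: rng.normal(size=dim) for w in vocab}       # stand-ins for word2vec vectors

# Observed (verb, noun) phrases with their "gold" phrase vectors
# (in practice, distributional vectors of attested phrases).
train_pairs = [("buy", "book"), ("sell", "car"), ("buy", "house")]
X = np.stack([np.concatenate([emb[v], emb[n]]) for v, n in train_pairs])
Y = np.stack([emb[v] + emb[n] + 0.1 * rng.normal(size=dim) for v, n in train_pairs])

def additive(v, n):
    """Compositional baseline: phrase = verb + noun."""
    return emb[v] + emb[n]

# Analogical alternative: learn a linear map from known phrases and
# reuse it for an unattested combination (least-squares fit).
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

novel = ("sell", "house")                            # unattested verb-noun pair
x_new = np.concatenate([emb[novel[0]], emb[novel[1]]])
predicted = x_new @ W

cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print("additive vs. learned:", round(cos(additive(*novel), predicted), 3))
```

    The only point of the sketch is that the second strategy derives the new phrase vector from stored exemplars rather than from a fixed composition rule.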

    LexFr: Adapting the LexIt Framework to Build a Corpus-Based French Subcategorization Lexicon

    This paper introduces LexFr, a corpus-based French lexical resource built by adapting the LexIt framework, originally developed to describe the combinatorial potential of Italian predicates. As in the original framework, the behavior of a group of target predicates is characterized by syntactic (i.e., subcategorization frames) and semantic (i.e., selectional preferences) statistical information (a.k.a. distributional profiles) whose extraction process is mostly unsupervised. The first release of LexFr includes information for 2,493 verbs, 7,939 nouns and 2,628 adjectives. In these pages we describe the adaptation process and evaluate the final resource by comparing the information collected for 20 test verbs against the information available in a gold standard dictionary. In the best performing setting, we obtained 0.74 precision, 0.66 recall and 0.70 F-measure.
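    For reference, the precision, recall and F-measure figures above follow the standard set-overlap definitions over extracted versus gold-standard frames; a minimal sketch (the frame labels are hypothetical, not actual LexFr output):

```python
# Frame-level evaluation as described above: subcategorization frames
# induced from the corpus are compared against a gold-standard dictionary.
def prf(extracted: set, gold: set):
    tp = len(extracted & gold)                       # frames found in both
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

extracted = {"subj#obj", "subj#obj#pp-a", "subj"}    # hypothetical induced frames
gold = {"subj#obj", "subj#obj#pp-a", "subj#pp-di"}   # hypothetical dictionary frames
print(prf(extracted, gold))                          # ≈ (0.67, 0.67, 0.67)
```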

    Event knowledge in large language models: the gap between the impossible and the unlikely

    Word co-occurrence patterns in language corpora contain a surprising amount of conceptual knowledge. Large language models (LLMs), trained to predict words in context, leverage these patterns to achieve impressive performance on diverse semantic tasks requiring world knowledge. An important but understudied question about LLMs' semantic abilities is whether they acquire generalized knowledge of common events. Here, we test whether five pre-trained LLMs (from 2018's BERT to 2023's MPT) assign higher likelihood to plausible descriptions of agent-patient interactions than to minimally different implausible versions of the same event. Using three curated sets of minimal sentence pairs (total n=1,215), we found that pre-trained LLMs possess substantial event knowledge, outperforming other distributional language models. In particular, they almost always assign higher likelihood to possible vs. impossible events (The teacher bought the laptop vs. The laptop bought the teacher). However, LLMs show less consistent preferences for likely vs. unlikely events (The nanny tutored the boy vs. The boy tutored the nanny). In follow-up analyses, we show that (i) LLM scores are driven by both plausibility and surface-level sentence features, (ii) LLM scores generalize well across syntactic variants (active vs. passive constructions) but less well across semantic variants (synonymous sentences), (iii) some LLM errors mirror human judgment ambiguity, and (iv) sentence plausibility serves as an organizing dimension in internal LLM representations. Overall, our results show that important aspects of event knowledge naturally emerge from distributional linguistic patterns, but also highlight a gap between representations of possible/impossible and likely/unlikely events. Comment: The two lead authors have contributed equally to this work.
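    For concreteness, this kind of minimal-pair comparison can be sketched with an off-the-shelf causal language model. GPT-2 and the Hugging Face transformers API are used here purely as an illustration; the paper evaluates several models, and its exact scoring procedure may differ.

```python
# Compare model likelihoods of a plausible vs. an implausible event description.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids the model returns the mean cross-entropy of the tokens.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)          # total log-probability

plausible = "The teacher bought the laptop."
implausible = "The laptop bought the teacher."
print(sentence_logprob(plausible) > sentence_logprob(implausible))
```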

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it).

    Intégration des approches distributionnelles et constructionnelles : vers un nouveau modèle de compréhension du langage

    This thesis explores two lines of research framed within the usage-based constructionist paradigm. On one side, we investigate how to ground the semantic content of constructions in language use, proposing to integrate the vector representations used in Distributional Semantic Models into the linguistic descriptions of Construction Grammar. On the other, we address a still open question: what cognitive and linguistic principles govern language comprehension? Considerable evidence suggests that interpretation alternates between compositional (incremental) and non-compositional (global) strategies. Although it is recognized that idioms are fast to process, we claim that even literal expressions, if frequent enough, are processed similarly. Using the Self-Paced Reading paradigm, we tested the reading times of idiomatic, high-frequency literal and low-frequency literal verb-noun phrases: facilitation effects also occur when reading frequent and yet compositional expressions. Concurrently, we argue that systematic processes of language productivity are mainly explainable by analogical inferences rather than sequential compositional operations: novel expressions are produced and understood "on the fly" by analogy with familiar ones. As the principle of compositionality has been used to generate distributional representations of phrases, we propose a neural network simulating the construction of phrasal embeddings as an analogical process. Our ANNE, inspired by word2vec and computer vision techniques, was evaluated on its ability to generalize from existing vectors. Overall, we hope this work will help clarify the complex literature on language comprehension and pave the way for new experimental and computational studies.

    UDLex: un lessico multilingue della struttura argomentale

    This thesis presents UDLex, a computational system for building a multilingual lexical resource. Exploiting the versatility of the Universal Dependencies annotation scheme (Nivre, 2015), the system automatically extracts the properties of verbal argument structure for several languages, adapting the LexIt framework (Lenci et al., 2012), originally developed to describe the argument structure of Italian predicates. The final multilingual lexicon makes it possible to explore the argument structures of semantically similar verbs across different languages. We also present a methodology for automatically building a multilingual argument-structure lexicon from the databases constructed with UDLex, in order to support the contrastive study of the syntactic patterns realized by semantically similar verbs in different languages.
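    As an illustration of the kind of extraction involved, the sketch below reads a toy CoNLL-U parse and collects the Universal Dependencies relations of each verb's arguments into a simplified subcategorization frame. The sentence, the chosen relation set, and the frame format are assumptions made for the example, not UDLex's actual inventory.

```python
# Collect the argument relations of each verb in a (toy) CoNLL-U parse.
CONLLU = """\
1\tMaria\tMaria\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tcompra\tcomprare\tVERB\t_\t_\t0\troot\t_\t_
3\tun\tuno\tDET\t_\t_\t4\tdet\t_\t_
4\tlibro\tlibro\tNOUN\t_\t_\t2\tobj\t_\t_
"""

ARG_RELS = {"nsubj", "obj", "iobj", "ccomp", "xcomp", "obl"}   # UD argument relations

rows = [line.split("\t") for line in CONLLU.strip().splitlines()]
verbs = {row[0]: row[2] for row in rows if row[3] == "VERB"}   # token id -> lemma

frames = {}
for row in rows:
    head, deprel = row[6], row[7]
    if head in verbs and deprel in ARG_RELS:
        frames.setdefault(verbs[head], []).append(deprel)

for lemma, rels in frames.items():
    print(lemma, "#".join(sorted(rels)))             # e.g. "comprare nsubj#obj"
```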

    Integrating Distributional and Constructional Approaches: Towards a new Model of Language Comprehension

    In this dissertation, we explore two lines of research framed within the usage-based constructionist paradigm. On one side, we investigate how to ground the semantic content of constructions in language use; we propose integrating the vector representations used in Distributional Semantic Models into the linguistic descriptions of Construction Grammar. Besides, we address a still open question: what cognitive and linguistic principles govern language comprehension? Considerable evidence suggests that interpretation alternates between compositional (incremental) and non-compositional (global) strategies. Although it is recognized that idioms are fast to process, we claim that even literal expressions, if frequent enough, are processed in the same way. Using the Self-Paced Reading paradigm, we tested the reading times of idiomatic and literal high-frequency and low-frequency verb-noun phrases, observing that facilitation effects also occur when processing frequent and yet compositional expressions. Concurrently, we claim that systematic processes of language productivity are mainly explainable by analogical inferences rather than sequential compositional operations: novel expressions are produced and understood 'on the fly' by analogy with familiar ones. As the principle of compositionality has been used to generate distributional representations of phrases, we propose a neural network simulating the construction of phrasal embeddings as an analogical process. Our ANNE, inspired by word2vec and computer vision techniques, was evaluated on its ability to generalize from existing vectors. Overall, we hope this work will help clarify the complex literature on language comprehension and pave the way for new experimental and computational studies.
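    The self-paced reading comparison can be summarized schematically: reading times of the critical region are aggregated by condition and compared across the three phrase types. The values below are placeholders, not the experiment's data, and the study itself relies on proper inferential statistics rather than raw means.

```python
# Per-condition mean reading times in a self-paced reading design (toy values).
from statistics import mean

rts = {                                   # condition -> critical-region RTs in ms
    "idiomatic":         [310, 295, 330],
    "literal_high_freq": [315, 305, 325],
    "literal_low_freq":  [360, 375, 350],
}

for condition, times in rts.items():
    print(f"{condition:18s} mean RT = {mean(times):.0f} ms")

# A facilitation effect would appear as comparable means for idiomatic and
# high-frequency literal phrases, both faster than low-frequency literals.
```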

    Letteratura per l’infanzia a tematica LGBTQ+ a scuola: risultati di una ricerca empirica tramite questionario rivolto ai genitori

    This thesis arose from the desire to understand parents' opinions about the presence of LGBTQ+ characters and themes in children's books. Unlike countries such as the United States, Great Britain and France, Italy has few children's books addressing sexual orientation and gender identity. Despite the efforts of some publishing houses and of various associations that have long been running projects in Italian schools to educate about differences, these themes still meet strong resistance. The scarcity of LGBTQ+ children's literature, and the fact that the Italian education system does not include the reading of works aimed at teaching younger children respect for diversity, are partly due to the opposition of ultraconservative circles and associations denouncing the so-called "gender theory". After outlining the European and Italian publishing landscape on these themes, the thesis focuses on an empirical study based on an anonymous questionnaire. In particular, it discusses the design of the questionnaire, which was addressed to parents and centred on whether (or not) books with LGBTQ+ characters should be read at school, and the difficulties encountered in administering it. Finally, it presents an analysis of the collected data, examining the answers provided by the parents who took part.

    CogALex-V Shared Task: ROOT18

    In this paper, we describe ROOT18, a classifier that uses the scores of several unsupervised distributional measures as features to discriminate between semantically related and unrelated words, and then to classify the related pairs according to their semantic relation (i.e. synonymy, antonymy, hypernymy, part-whole meronymy). Our classifier participated in the CogALex-V Shared Task, showing a solid performance on the first subtask but a poor performance on the second subtask. The low scores reported on the second subtask suggest that distributional measures are not sufficient to discriminate between multiple semantic relations at once.
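    The abstract does not list the specific measures or the learning algorithm, so the sketch below only illustrates the two-step setup: each word pair is represented by a small feature vector of distributional scores (here just cosine similarity over toy vectors) and passed to a supervised classifier (a random forest is an assumption, not necessarily ROOT18's choice).

```python
# Word pairs -> distributional-score features -> supervised relation classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
words = ["car", "automobile", "hot", "cold", "dog", "animal", "wheel"]
emb = {w: rng.normal(size=50) for w in words}        # stand-ins for real vectors

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy training pairs labelled with their semantic relation.
pairs = [("car", "automobile", "synonym"),
         ("hot", "cold", "antonym"),
         ("dog", "animal", "hypernym"),
         ("wheel", "car", "meronym")]

X = np.array([[cosine(emb[a], emb[b])] for a, b, _ in pairs])   # one feature per pair
y = [label for _, _, label in pairs]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([[cosine(emb["dog"], emb["animal"])]]))
```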